Overview

Dataset statistics

Number of variables: 13
Number of observations: 35441
Missing cells: 54
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.5 MiB
Average record size in memory: 104.0 B

Variable types

Numeric: 4
Categorical: 9

Warnings

INSEE has a high cardinality: 35441 distinct values (High cardinality)
LIBGEO has a high cardinality: 33082 distinct values (High cardinality)
DEP has a high cardinality: 100 distinct values (High cardinality)
EPCI has a high cardinality: 1263 distinct values (High cardinality)
loypredm2 has a high cardinality: 5564 distinct values (High cardinality)
lwr.IPm2 has a high cardinality: 5564 distinct values (High cardinality)
upr.IPm2 has a high cardinality: 5564 distinct values (High cardinality)
R2adj has a high cardinality: 2776 distinct values (High cardinality)
NBobs_maille is highly correlated with NBobs_commune (High correlation)
NBobs_commune is highly correlated with NBobs_maille (High correlation)
NBobs_maille is highly skewed (γ1 = 37.86081434) (Skewed)
NBobs_commune is highly skewed (γ1 = 37.96537698) (Skewed)
INSEE is uniformly distributed (Uniform)
LIBGEO is uniformly distributed (Uniform)
INSEE has unique values (Unique)
NBobs_commune has 11142 (31.4%) zeros (Zeros)
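
The columns loypredm2, lwr.IPm2, upr.IPm2 and R2adj are typed as Categorical in this report, and their sample values (for example 9,372335348) use a decimal comma. A plausible reading is that these columns were loaded as text rather than as numbers, which would also explain their high-cardinality warnings. The sketch below shows one way such strings could be converted to floats; the helper name parse_decimal_comma is hypothetical and not part of the report or of pandas-profiling.

```python
def parse_decimal_comma(text: str) -> float:
    """Convert a decimal-comma string such as '9,372335348' to a float."""
    # Assumption: the comma is only ever a decimal separator, never a
    # thousands separator, as in the sample values shown in this report.
    return float(text.strip().replace(",", "."))


# Sample values copied from the loypredm2 section of the report.
samples = ["9,372335348", "8,63555202", "10,07450708"]
values = [parse_decimal_comma(s) for s in samples]
print(values)
```

If the data is loaded with pandas, the decimal="," argument of pandas.read_csv is another way to get numeric columns at load time, assuming the source file is a delimited text file.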

Reproduction

Analysis started: 2021-02-18 21:30:21.437614
Analysis finished: 2021-02-18 21:32:17.040240
Duration: 1 minute and 55.6 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

id_zone
Real number (ℝ≥0)

Distinct: 2760
Distinct (%): 7.8%
Missing: 54
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1893.15322
Minimum: 0
Maximum: 2759
Zeros: 1
Zeros (%): < 0.1%
Memory size: 277.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1447.3
Q1: 1721
Median: 1880
Q3: 2087
95th percentile: 2473
Maximum: 2759
Range: 2759
Interquartile range (IQR): 366
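
The range and IQR in this block follow directly from the other entries: Range = Maximum - Minimum and IQR = Q3 - Q1 (here 2087 - 1721 = 366). The sketch below computes the same kind of summary for a small list with the standard library. The function name quantile_summary is hypothetical, and the "inclusive" interpolation method is an assumption; the report does not say which quantile rule it applies.

```python
import statistics


def quantile_summary(values):
    """Return min, quartiles, max, range and IQR for a list of numbers."""
    data = sorted(values)
    # Assumption: linear interpolation over the closed range of the data.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


summary = quantile_summary([5, 1, 4, 2, 3])
print(summary)
```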

Descriptive statistics

Standard deviation: 379.0626613
Coefficient of variation (CV): 0.2002281998
Kurtosis: 5.921999223
Mean: 1893.15322
Median Absolute Deviation (MAD): 176
Skewness: -1.528489725
Sum: 66993013
Variance: 143688.5012
Monotonicity: Not monotonic
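
Several of these figures can be cross-checked against each other: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of non-missing values. The short check below uses the numbers reported for id_zone. It assumes that the mean and sum are taken over the 35441 - 54 non-missing rows, which the report does not state explicitly but which is consistent with the figures.

```python
import math

# Figures copied from the id_zone section of the report.
n_rows = 35441
n_missing = 54
mean = 1893.15322
std = 379.0626613
reported_cv = 0.2002281998
reported_variance = 143688.5012
reported_sum = 66993013

# Assumption: statistics are computed over non-missing values only.
n_valid = n_rows - n_missing

cv_ok = math.isclose(std / mean, reported_cv, rel_tol=1e-6)
variance_ok = math.isclose(std ** 2, reported_variance, rel_tol=1e-6)
sum_ok = math.isclose(mean * n_valid, reported_sum, rel_tol=1e-6)
print(cv_ok, variance_ok, sum_ok)
```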
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
1699                   363     1.0%
1673                   242     0.7%
1670                   229     0.6%
1990                   216     0.6%
1680                   207     0.6%
1661                   190     0.5%
1933                   184     0.5%
1960                   183     0.5%
1669                   182     0.5%
1922                   180     0.5%
Other values (2750)    33211   93.7%

Value                  Count   Frequency (%)
0                      1       < 0.1%
1                      1       < 0.1%
2                      1       < 0.1%
3                      1       < 0.1%
4                      1       < 0.1%

Value                  Count   Frequency (%)
2759                   2       < 0.1%
2758                   2       < 0.1%
2757                   2       < 0.1%
2756                   2       < 0.1%
2755                   3       < 0.1%

INSEE
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 35441
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                   Count
21039                   1
60123                   1
60058                   1
33338                   1
50276                   1
Other values (35436)    35436

Length

Max length: 5
Median length: 5
Mean length: 4.910301628
Min length: 4

Characters and Unicode

Total characters: 174026
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
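
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's unicodedata module. The two-letter codes it returns correspond to the long names used in this report ("Nd" is Decimal Number, "Lu" is Uppercase Letter). The function name category_counts and the example codes "2A004" and "2B033" are illustrative assumptions, not values taken from the data; this is not necessarily how the report computes its counts.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters of all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# "1001" is a sample value from the report; the other two are made-up
# codes containing a letter, used only to show a second category.
counts = category_counts(["1001", "2A004", "2B033"])
print(counts)
```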

Unique

Unique: 35441
Unique (%): 100.0%

Sample

1st row: 1001
2nd row: 1002
3rd row: 1004
4th row: 1005
5th row: 1006

Value                   Count   Frequency (%)
21039                   1       < 0.1%
60123                   1       < 0.1%
60058                   1       < 0.1%
33338                   1       < 0.1%
50276                   1       < 0.1%
87144                   1       < 0.1%
67252                   1       < 0.1%
31514                   1       < 0.1%
45130                   1       < 0.1%
36026                   1       < 0.1%
Other values (35431)    35431   > 99.9%
Histogram of lengths of the category
ValueCountFrequency (%)
210391
 
< 0.1%
644091
 
< 0.1%
412861
 
< 0.1%
172931
 
< 0.1%
570081
 
< 0.1%
515471
 
< 0.1%
301451
 
< 0.1%
360261
 
< 0.1%
371271
 
< 0.1%
892161
 
< 0.1%
Other values (35431)35431
> 99.9%

Most occurring characters

ValueCountFrequency (%)
123426
13.5%
222955
13.2%
019434
11.2%
318843
10.8%
517189
9.9%
416624
9.6%
615855
9.1%
715220
8.7%
813474
7.7%
910646
6.1%
Other values (2)360
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number173666
99.8%
Uppercase Letter360
 
0.2%

Most frequent character per category

ValueCountFrequency (%)
123426
13.5%
222955
13.2%
019434
11.2%
318843
10.9%
517189
9.9%
416624
9.6%
615855
9.1%
715220
8.8%
813474
7.8%
910646
6.1%
ValueCountFrequency (%)
B236
65.6%
A124
34.4%

Most occurring scripts

ValueCountFrequency (%)
Common173666
99.8%
Latin360
 
0.2%

Most frequent character per script

ValueCountFrequency (%)
123426
13.5%
222955
13.2%
019434
11.2%
318843
10.9%
517189
9.9%
416624
9.6%
615855
9.1%
715220
8.8%
813474
7.8%
910646
6.1%
ValueCountFrequency (%)
B236
65.6%
A124
34.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII174026
100.0%

Most frequent character per block

ValueCountFrequency (%)
123426
13.5%
222955
13.2%
019434
11.2%
318843
10.8%
517189
9.9%
416624
9.6%
615855
9.1%
715220
8.7%
813474
7.7%
910646
6.1%
Other values (2)360
 
0.2%

LIBGEO
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 33082
Distinct (%): 93.3%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                   Count
Sainte-Colombe          13
Saint-Sauveur           11
Sainte-Marie            11
Saint-Aubin             10
Saint-Loup              10
Other values (33077)    35386

Length

Max length: 45
Median length: 10
Mean length: 11.77145114
Min length: 1

Characters and Unicode

Total characters: 417192
Distinct characters: 83
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31578
Unique (%): 89.1%
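
The Distinct and Unique figures differ for LIBGEO (33082 versus 31578). One reading consistent with these numbers is that Distinct counts the different values present, while Unique counts the values that occur exactly once. The sketch below illustrates that reading on a tiny list; the function name distinct_and_unique is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Commune names taken from this report; the repetition is illustrative.
names = ["Sainte-Colombe", "Sainte-Colombe", "Beaulieu", "Ambléon"]
distinct, unique = distinct_and_unique(names)
print(distinct, unique)
```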

Sample

1st row: L'Abergement-Clémenciat
2nd row: L'Abergement-de-Varey
3rd row: Ambérieu-en-Bugey
4th row: Ambérieux-en-Dombes
5th row: Ambléon

Value                   Count   Frequency (%)
Sainte-Colombe          13      < 0.1%
Saint-Sauveur           11      < 0.1%
Sainte-Marie            11      < 0.1%
Saint-Aubin             10      < 0.1%
Saint-Loup              10      < 0.1%
Beaulieu                10      < 0.1%
Saint-Michel            9       < 0.1%
Saint-Marcel            9       < 0.1%
Saint-Paul              9       < 0.1%
Saint-Hilaire           9       < 0.1%
Other values (33072)    35340   99.7%
Histogram of lengths of the category
ValueCountFrequency (%)
la1081
 
2.8%
le773
 
2.0%
les340
 
0.9%
arrondissement45
 
0.1%
de31
 
0.1%
val29
 
0.1%
en28
 
0.1%
paris20
 
0.1%
sur18
 
< 0.1%
marseille16
 
< 0.1%
Other values (32967)35711
93.7%

Most occurring characters

ValueCountFrequency (%)
e44495
 
10.7%
a33137
 
7.9%
n30091
 
7.2%
i27676
 
6.6%
r27221
 
6.5%
-25294
 
6.1%
l24403
 
5.8%
s22424
 
5.4%
o21039
 
5.0%
u20392
 
4.9%
Other values (73)141020
33.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter333226
79.9%
Uppercase Letter55108
 
13.2%
Dash Punctuation25294
 
6.1%
Space Separator2651
 
0.6%
Other Punctuation838
 
0.2%
Decimal Number63
 
< 0.1%
Open Punctuation6
 
< 0.1%
Close Punctuation6
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e44495
13.4%
a33137
9.9%
n30091
9.0%
i27676
8.3%
r27221
8.2%
l24403
 
7.3%
s22424
 
6.7%
o21039
 
6.3%
u20392
 
6.1%
t18101
 
5.4%
Other values (29)64247
19.3%
ValueCountFrequency (%)
S7417
13.5%
C5941
10.8%
L5541
10.1%
M5400
9.8%
B5134
9.3%
A3350
 
6.1%
V3293
 
6.0%
P3173
 
5.8%
G2248
 
4.1%
R2071
 
3.8%
Other values (19)11540
20.9%
ValueCountFrequency (%)
122
34.9%
26
 
9.5%
35
 
7.9%
45
 
7.9%
55
 
7.9%
65
 
7.9%
74
 
6.3%
84
 
6.3%
94
 
6.3%
03
 
4.8%
ValueCountFrequency (%)
'838
100.0%
ValueCountFrequency (%)
-25294
100.0%
ValueCountFrequency (%)
2651
100.0%
ValueCountFrequency (%)
(6
100.0%
ValueCountFrequency (%)
)6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin388334
93.1%
Common28858
 
6.9%

Most frequent character per script

ValueCountFrequency (%)
e44495
 
11.5%
a33137
 
8.5%
n30091
 
7.7%
i27676
 
7.1%
r27221
 
7.0%
l24403
 
6.3%
s22424
 
5.8%
o21039
 
5.4%
u20392
 
5.3%
t18101
 
4.7%
Other values (58)119355
30.7%
ValueCountFrequency (%)
-25294
87.6%
2651
 
9.2%
'838
 
2.9%
122
 
0.1%
26
 
< 0.1%
(6
 
< 0.1%
)6
 
< 0.1%
35
 
< 0.1%
45
 
< 0.1%
55
 
< 0.1%
Other values (5)20
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII408789
98.0%
None8403
 
2.0%

Most frequent character per block

ValueCountFrequency (%)
e44495
 
10.9%
a33137
 
8.1%
n30091
 
7.4%
i27676
 
6.8%
r27221
 
6.7%
-25294
 
6.2%
l24403
 
6.0%
s22424
 
5.5%
o21039
 
5.1%
u20392
 
5.0%
Other values (57)132617
32.4%
ValueCountFrequency (%)
é4237
50.4%
è2351
28.0%
É642
 
7.6%
â451
 
5.4%
ô184
 
2.2%
ê156
 
1.9%
ë123
 
1.5%
ç109
 
1.3%
î40
 
0.5%
û34
 
0.4%
Other values (6)76
 
0.9%

DEP
Categorical

HIGH CARDINALITY

Distinct: 100
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                Count
62                   891
2                    804
80                   779
57                   727
76                   711
Other values (95)    31529

Length

Max length: 3
Median length: 2
Mean length: 1.91346181
Min length: 1

Characters and Unicode

Total characters: 67815
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Value                Count   Frequency (%)
62                   891     2.5%
2                    804     2.3%
80                   779     2.2%
57                   727     2.1%
76                   711     2.0%
21                   704     2.0%
60                   687     1.9%
59                   648     1.8%
51                   616     1.7%
27                   602     1.7%
Other values (90)    28272   79.8%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
28576
12.6%
18032
11.8%
57911
11.7%
77900
11.6%
67794
11.5%
36901
10.2%
86513
9.6%
46164
9.1%
93963
5.8%
03701
5.5%
Other values (2)360
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number67455
99.5%
Uppercase Letter360
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
28576
12.7%
18032
11.9%
57911
11.7%
77900
11.7%
67794
11.6%
36901
10.2%
86513
9.7%
46164
9.1%
93963
5.9%
03701
5.5%
ValueCountFrequency (%)
B236
65.6%
A124
34.4%

Most occurring scripts

ValueCountFrequency (%)
Common67455
99.5%
Latin360
 
0.5%

Most frequent character per script

ValueCountFrequency (%)
28576
12.7%
18032
11.9%
57911
11.7%
77900
11.7%
67794
11.6%
36901
10.2%
86513
9.7%
46164
9.1%
93963
5.9%
03701
5.5%
ValueCountFrequency (%)
B236
65.6%
A124
34.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII67815
100.0%

Most frequent character per block

ValueCountFrequency (%)
28576
12.6%
18032
11.8%
57911
11.7%
77900
11.6%
67794
11.5%
36901
10.2%
86513
9.6%
46164
9.1%
93963
5.8%
03701
5.5%
Other values (2)360
 
0.5%

REG
Real number (ℝ≥0)

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.33218589
Minimum: 1
Maximum: 94
Zeros: 0
Zeros (%): 0.0%
Memory size: 277.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 24
Q1: 28
Median: 44
Q3: 76
95th percentile: 84
Maximum: 94
Range: 93
Interquartile range (IQR): 48

Descriptive statistics

Standard deviation: 24.33488505
Coefficient of variation (CV): 0.4650079991
Kurtosis: -1.452736214
Mean: 52.33218589
Median Absolute Deviation (MAD): 17
Skewness: 0.1001877502
Sum: 1854705
Variance: 592.1866302
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value               Count   Frequency (%)
44                  5136    14.5%
76                  4488    12.7%
75                  4413    12.5%
84                  4103    11.6%
32                  3809    10.7%
27                  3739    10.5%
28                  2722    7.7%
24                  1783    5.0%
11                  1296    3.7%
52                  1281    3.6%
Other values (7)    2671    7.5%

Value               Count   Frequency (%)
1                   32      0.1%
2                   34      0.1%
3                   22      0.1%
4                   24      0.1%
11                  1296    3.7%

Value               Count   Frequency (%)
94                  360     1.0%
93                  966     2.7%
84                  4103    11.6%
76                  4488    12.7%
75                  4413    12.5%

EPCI
Categorical

HIGH CARDINALITY

Distinct: 1263
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                  Count
200067106              158
200054781              150
200067213              143
200067205              132
200041523              129
Other values (1258)    34729

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 318969
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 200069193
2nd row: 240100883
3rd row: 240100883
4th row: 200042497
5th row: 200040350

Value                  Count   Frequency (%)
200067106              158     0.4%
200054781              150     0.4%
200067213              143     0.4%
200067205              132     0.4%
200041523              129     0.4%
245701206              128     0.4%
200071181              120     0.3%
200041689              111     0.3%
200054807              107     0.3%
242101434              107     0.3%
Other values (1253)    34156   96.4%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
0111600
35.0%
250216
15.7%
429029
 
9.1%
625711
 
8.1%
723352
 
7.3%
317699
 
5.5%
117438
 
5.5%
515584
 
4.9%
914449
 
4.5%
813846
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number318924
> 99.9%
Uppercase Letter45
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
0111600
35.0%
250216
15.7%
429029
 
9.1%
625711
 
8.1%
723352
 
7.3%
317699
 
5.5%
117438
 
5.5%
515584
 
4.9%
914449
 
4.5%
813846
 
4.3%
ValueCountFrequency (%)
Z45
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common318924
> 99.9%
Latin45
 
< 0.1%

Most frequent character per script

ValueCountFrequency (%)
0111600
35.0%
250216
15.7%
429029
 
9.1%
625711
 
8.1%
723352
 
7.3%
317699
 
5.5%
117438
 
5.5%
515584
 
4.9%
914449
 
4.5%
813846
 
4.3%
ValueCountFrequency (%)
Z45
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII318969
100.0%

Most frequent character per block

ValueCountFrequency (%)
0111600
35.0%
250216
15.7%
429029
 
9.1%
625711
 
8.1%
723352
 
7.3%
317699
 
5.5%
117438
 
5.5%
515584
 
4.9%
914449
 
4.5%
813846
 
4.3%

TYPPRED
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value      Count
maille     22713
epci       8769
commune    3959

Length

Max length: 7
Median length: 6
Mean length: 5.616856184
Min length: 4

Characters and Unicode

Total characters: 199067
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: maille
2nd row: maille
3rd row: commune
4th row: maille
5th row: maille

Value      Count   Frequency (%)
maille     22713   64.1%
epci       8769    24.7%
commune    3959    11.2%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
l45426
22.8%
e35441
17.8%
i31482
15.8%
m30631
15.4%
a22713
11.4%
c12728
 
6.4%
p8769
 
4.4%
o3959
 
2.0%
u3959
 
2.0%
n3959
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter199067
100.0%

Most frequent character per category

ValueCountFrequency (%)
l45426
22.8%
e35441
17.8%
i31482
15.8%
m30631
15.4%
a22713
11.4%
c12728
 
6.4%
p8769
 
4.4%
o3959
 
2.0%
u3959
 
2.0%
n3959
 
2.0%

Most occurring scripts

ValueCountFrequency (%)
Latin199067
100.0%

Most frequent character per script

ValueCountFrequency (%)
l45426
22.8%
e35441
17.8%
i31482
15.8%
m30631
15.4%
a22713
11.4%
c12728
 
6.4%
p8769
 
4.4%
o3959
 
2.0%
u3959
 
2.0%
n3959
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII199067
100.0%

Most frequent character per block

ValueCountFrequency (%)
l45426
22.8%
e35441
17.8%
i31482
15.8%
m30631
15.4%
a22713
11.4%
c12728
 
6.4%
p8769
 
4.4%
o3959
 
2.0%
u3959
 
2.0%
n3959
 
2.0%

loypredm2
Categorical

HIGH CARDINALITY

Distinct: 5564
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                  Count
7,243380942            198
7,496207877            163
7,512880963            139
8,753956429            133
9,300040138            125
Other values (5559)    34683

Length

Max length: 11
Median length: 11
Mean length: 10.89579865
Min length: 7

Characters and Unicode

Total characters: 386158
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4138
Unique (%): 11.7%

Sample

1st row: 9,372335348
2nd row: 8,63555202
3rd row: 10,07450708
4th row: 9,372335348
5th row: 8,96695486

Value                  Count   Frequency (%)
7,243380942            198     0.6%
7,496207877            163     0.5%
7,512880963            139     0.4%
8,753956429            133     0.4%
9,300040138            125     0.4%
9,878471129            120     0.3%
7,245641948            120     0.3%
7,870065337            114     0.3%
9,236952347            108     0.3%
8,330224166            106     0.3%
Other values (5554)    34115   96.3%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
741586
10.8%
840973
10.6%
140886
10.6%
937328
9.7%
,35441
9.2%
233510
8.7%
632484
8.4%
332061
8.3%
030868
8.0%
530662
7.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number350717
90.8%
Other Punctuation35441
 
9.2%

Most frequent character per category

ValueCountFrequency (%)
741586
11.9%
840973
11.7%
140886
11.7%
937328
10.6%
233510
9.6%
632484
9.3%
332061
9.1%
030868
8.8%
530662
8.7%
430359
8.7%
ValueCountFrequency (%)
,35441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common386158
100.0%

Most frequent character per script

ValueCountFrequency (%)
741586
10.8%
840973
10.6%
140886
10.6%
937328
9.7%
,35441
9.2%
233510
8.7%
632484
8.4%
332061
8.3%
030868
8.0%
530662
7.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII386158
100.0%

Most frequent character per block

ValueCountFrequency (%)
741586
10.8%
840973
10.6%
140886
10.6%
937328
9.7%
,35441
9.2%
233510
8.7%
632484
8.4%
332061
8.3%
030868
8.0%
530662
7.9%

lwr.IPm2
Categorical

HIGH CARDINALITY

Distinct: 5564
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                  Count
5,516260102            198
5,793232797            163
5,520493404            139
6,789188203            133
7,10826552             125
Other values (5559)    34683

Length

Max length: 11
Median length: 11
Mean length: 10.89032477
Min length: 8

Characters and Unicode

Total characters: 385964
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4138
Unique (%): 11.7%

Sample

1st row: 7,409663525
2nd row: 6,72107751
3rd row: 7,865307059
4th row: 7,409663525
5th row: 7,101593346

Value                  Count   Frequency (%)
5,516260102            198     0.6%
5,793232797            163     0.5%
5,520493404            139     0.4%
6,789188203            133     0.4%
7,10826552             125     0.4%
5,391853114            120     0.3%
6,910446411            120     0.3%
5,993167722            114     0.3%
7,553745842            108     0.3%
6,148226304            106     0.3%
Other values (5554)    34115   96.3%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
642192
10.9%
540091
10.4%
738615
10.0%
836780
9.5%
,35441
9.2%
334537
8.9%
133612
8.7%
232521
8.4%
932430
8.4%
431805
8.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number350523
90.8%
Other Punctuation35441
 
9.2%

Most frequent character per category

ValueCountFrequency (%)
642192
12.0%
540091
11.4%
738615
11.0%
836780
10.5%
334537
9.9%
133612
9.6%
232521
9.3%
932430
9.3%
431805
9.1%
027940
8.0%
ValueCountFrequency (%)
,35441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common385964
100.0%

Most frequent character per script

ValueCountFrequency (%)
642192
10.9%
540091
10.4%
738615
10.0%
836780
9.5%
,35441
9.2%
334537
8.9%
133612
8.7%
232521
8.4%
932430
8.4%
431805
8.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII385964
100.0%

Most frequent character per block

ValueCountFrequency (%)
642192
10.9%
540091
10.4%
738615
10.0%
836780
9.5%
,35441
9.2%
334537
8.9%
133612
8.7%
232521
8.4%
932430
8.4%
431805
8.2%

upr.IPm2
Categorical

HIGH CARDINALITY

Distinct: 5564
Distinct (%): 15.7%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                  Count
9,511256993            198
9,699788444            163
10,22433617            139
11,28732197            133
12,16763025            125
Other values (5559)    34683

Length

Max length: 11
Median length: 11
Mean length: 10.88764425
Min length: 8

Characters and Unicode

Total characters: 385869
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4138
Unique (%): 11.7%

Sample

1st row: 11,85487972
2nd row: 11,09535764
3rd row: 12,90422511
4th row: 11,85487972
5th row: 11,3222872

Value                  Count   Frequency (%)
9,511256993            198     0.6%
9,699788444            163     0.5%
10,22433617            139     0.4%
11,28732197            133     0.4%
12,16763025            125     0.4%
14,1212573             120     0.3%
9,736787358            120     0.3%
10,33475639            114     0.3%
11,29522894            108     0.3%
10,81769523            106     0.3%
Other values (5554)    34115   96.3%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
160164
15.6%
937909
9.8%
,35441
9.2%
234915
9.0%
033040
8.6%
331954
8.3%
831910
8.3%
630950
8.0%
430512
7.9%
529924
7.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number350428
90.8%
Other Punctuation35441
 
9.2%

Most frequent character per category

ValueCountFrequency (%)
160164
17.2%
937909
10.8%
234915
10.0%
033040
9.4%
331954
9.1%
831910
9.1%
630950
8.8%
430512
8.7%
529924
8.5%
729150
8.3%
ValueCountFrequency (%)
,35441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common385869
100.0%

Most frequent character per script

ValueCountFrequency (%)
160164
15.6%
937909
9.8%
,35441
9.2%
234915
9.0%
033040
8.6%
331954
8.3%
831910
8.3%
630950
8.0%
430512
7.9%
529924
7.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII385869
100.0%

Most frequent character per block

ValueCountFrequency (%)
160164
15.6%
937909
9.8%
,35441
9.2%
234915
9.0%
033040
8.6%
331954
8.3%
831910
8.3%
630950
8.0%
430512
7.9%
529924
7.8%

R2adj
Categorical

HIGH CARDINALITY

Distinct: 2776
Distinct (%): 7.8%
Missing: 0
Missing (%): 0.0%
Memory size: 277.0 KiB

Value                  Count
0,593436021            363
0,609797979            242
0,682352094            229
0,712818156            216
0,694248084            207
Other values (2771)    34184

Length

Max length: 11
Median length: 11
Mean length: 10.89469823
Min length: 8

Characters and Unicode

Total characters: 386119
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1382
Unique (%): 3.9%

Sample

1st row: 0,774249278
2nd row: 0,719598662
3rd row: 0,670160882
4th row: 0,774249278
5th row: 0,752823051

Value                  Count   Frequency (%)
0,593436021            363     1.0%
0,609797979            242     0.7%
0,682352094            229     0.6%
0,712818156            216     0.6%
0,694248084            207     0.6%
0,496968747            190     0.5%
0,621223154            184     0.5%
0,641613615            183     0.5%
0,583880646            182     0.5%
0,657914305            180     0.5%
Other values (2766)    33265   93.9%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
059255
15.3%
740775
10.6%
640622
10.5%
,35441
9.2%
834250
8.9%
930905
8.0%
529921
7.7%
129841
7.7%
429375
7.6%
327998
7.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number350678
90.8%
Other Punctuation35441
 
9.2%

Most frequent character per category

ValueCountFrequency (%)
059255
16.9%
740775
11.6%
640622
11.6%
834250
9.8%
930905
8.8%
529921
8.5%
129841
8.5%
429375
8.4%
327998
8.0%
227736
7.9%
ValueCountFrequency (%)
,35441
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common386119
100.0%

Most frequent character per script

ValueCountFrequency (%)
059255
15.3%
740775
10.6%
640622
10.5%
,35441
9.2%
834250
8.9%
930905
8.0%
529921
7.7%
129841
7.7%
429375
7.6%
327998
7.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII386119
100.0%

Most frequent character per block

ValueCountFrequency (%)
059255
15.3%
740775
10.6%
640622
10.5%
,35441
9.2%
834250
8.9%
930905
8.0%
529921
7.7%
129841
7.7%
429375
7.6%
327998
7.3%

NBobs_maille
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1446
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1054.792782
Minimum: 467
Maximum: 200840
Zeros: 0
Zeros (%): 0.0%
Memory size: 277.0 KiB

Quantile statistics

Minimum: 467
5th percentile: 510
Q1: 655
Median: 853
Q3: 1086
95th percentile: 1774
Maximum: 200840
Range: 200373
Interquartile range (IQR): 431

Descriptive statistics

Standard deviation: 2298.80593
Coefficient of variation (CV): 2.179391032
Kurtosis: 2253.286977
Mean: 1054.792782
Median Absolute Deviation (MAD): 207
Skewness: 37.86081434
Sum: 37382911
Variance: 5284508.704
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
1048                   363     1.0%
667                    265     0.7%
697                    249     0.7%
1774                   242     0.7%
903                    233     0.7%
979                    229     0.6%
938                    219     0.6%
1206                   217     0.6%
861                    200     0.6%
1139                   198     0.6%
Other values (1436)    33026   93.2%

Value                  Count   Frequency (%)
467                    19      0.1%
469                    14      < 0.1%
470                    4       < 0.1%
472                    95      0.3%
473                    3       < 0.1%

Value                  Count   Frequency (%)
200840                 1       < 0.1%
117874                 1       < 0.1%
109497                 1       < 0.1%
82548                  1       < 0.1%
78298                  1       < 0.1%

NBobs_commune
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1694
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 204.3819587
Minimum: 0
Maximum: 200840
Zeros: 11142
Zeros (%): 31.4%
Memory size: 277.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 4
Q3: 23
95th percentile: 428
Maximum: 200840
Range: 200840
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 2315.567062
Coefficient of variation (CV): 11.32960598
Kurtosis: 2241.832393
Mean: 204.3819587
Median Absolute Deviation (MAD): 4
Skewness: 37.96537698
Sum: 7243501
Variance: 5361850.817
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
0                      11142   31.4%
2                      2301    6.5%
1                      2081    5.9%
3                      1618    4.6%
4                      1315    3.7%
5                      1067    3.0%
6                      938     2.6%
7                      732     2.1%
8                      618     1.7%
9                      591     1.7%
Other values (1684)    13038   36.8%

Value                  Count   Frequency (%)
0                      11142   31.4%
1                      2081    5.9%
2                      2301    6.5%
3                      1618    4.6%
4                      1315    3.7%

Value                  Count   Frequency (%)
200840                 1       < 0.1%
117874                 1       < 0.1%
109497                 1       < 0.1%
82548                  1       < 0.1%
78298                  1       < 0.1%

Interactions

[Figures: 12 interaction plots, not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
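
The calculation described above can be sketched in plain Python as follows. The function name pearson_r is hypothetical, and this is an illustration of the formula rather than the code the report runs. The common 1/n factor in the covariance and in the standard deviations cancels, so it is left out.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


rising = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
falling = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])
print(rising, falling)
```

Rescaling or shifting either input (for example multiplying ys by 10 and adding 5) leaves the result unchanged, which is the invariance mentioned above.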

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
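
A minimal sketch of this rank-based calculation is given below. It replaces each value by its rank and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their ranks is an assumption; the text above does not say how ties are handled. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A nonlinear but strictly increasing relationship.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rho)
```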

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
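
The pair-counting definition can be sketched directly, as below. This is the simplest variant (often called tau-a), in which tied pairs count as neither concordant nor discordant. Statistical libraries frequently use a tie-adjusted variant instead, so results may differ on data with many ties. The function name kendall_tau_a is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    total = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        total += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move the same way
        elif direction < 0:
            discordant += 1  # the variables move in opposite ways
    return (concordant - discordant) / total


same_order = kendall_tau_a([1, 2, 3], [1, 2, 3])
reverse_order = kendall_tau_a([1, 2, 3], [3, 2, 1])
one_swap = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(same_order, reverse_order, one_swap)
```

This direct version compares every pair, so its cost grows quadratically with the number of observations.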

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
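
A sketch of a bias-corrected Cramér's V computed from a contingency table of counts is shown below. It follows the commonly cited form of Bergsma's correction; whether it matches the report's implementation detail for detail is an assumption. The function name cramers_v_corrected is hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections applied to phi squared and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


independent = cramers_v_corrected([[25, 25], [25, 25]])
perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(independent, perfect)
```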

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

id_zone INSEE LIBGEO DEP REG EPCI TYPPRED loypredm2 lwr.IPm2 upr.IPm2 R2adj NBobs_maille NBobs_commune
023621001L'Abergement-Clémenciat184200069193maille9,3723353487,40966352511,854879720,7742492787012
120101002L'Abergement-de-Varey184240100883maille8,635552026,7210775111,095357640,7195986628610
214561004Ambérieu-en-Bugey184240100883commune10,074507087,86530705912,904225110,67016088221562156
323621005Ambérieux-en-Dombes184200042497maille9,3723353487,40966352511,854879720,77424927870181
420551006Ambléon184200040350maille8,966954867,10159334611,32228720,7528230518423
520101007Ambronay184240100883commune9,0392359087,02031618911,63876150,719598662861104
620101008Ambutrix184240100883maille8,635552026,7210775111,095357640,71959866286112
723291009Andert-et-Condon184200040350maille8,8990906386,73887533911,751785010,7270209716555
821121010Anglefort184200070852maille11,172663938,78475394214,209665990,79699942959728
926731011Apremont184200042935maille8,9314849127,04298510111,32636540,8285648564870

Last rows

id_zone INSEE LIBGEO DEP REG EPCI TYPPRED loypredm2 lwr.IPm2 upr.IPm2 R2adj NBobs_maille NBobs_commune
35431160697415Saint-Paul9744249740101commune15,3793117610,2033484923,180942050,44757953450595059
35432160997416Saint-Pierre9744249740077commune12,430157158,86913450217,420956540,64719473547444509
35433161497417Saint-Philippe9744249740085maille9,9678831377,88960718312,593617390,72321902174524
35434161697418Sainte-Marie9744249740119commune11,693313998,8761981515,404522280,737964684854854
35435161497419Sainte-Rose9744249740093maille9,9678831377,88960718312,593617390,72321902174516
35436180297420Sainte-Suzanne9744249740119commune10,673569688,79153018912,958505210,788143839847411
35437180297421Salazie9744249740093maille10,387459388,54519170912,626903640,78814383984711
35438160897422Le Tampon9744249740085commune10,815484888,69633280213,451039180,8154140532373237
35439160797423Les Trois-Bassins9744249740101maille12,343159718,59780240517,720061990,460731322166990
35440160797424Cilaos9744249740077maille12,343159718,59780240517,720061990,460731322166940